Deep Motif: Visualizing Genomic Sequence Classifications

نویسندگان

  • Jack Lanchantin
  • Ritambhara Singh
  • Zeming Lin
  • Yanjun Qi
چکیده

This paper applies a deep convolutional/highway MLP framework to classify genomic sequences on the transcription factor binding site task. To make the model understandable, we propose an optimization driven strategy to extract “motifs”, or symbolic patterns which visualize the positive class learned by the network. We show that our system, Deep Motif (DeMo), extracts motifs that are similar to, and in some cases outperform the current well known motifs. In addition, we find that a deeper model consisting of multiple convolutional and highway layers can outperform a single convolutional and fully connected layer in the previous state-of-the-art.1

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding (TFBS) site classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) ...

متن کامل

Genetic Architect: Discovering Genomic Structure with Learned Neural Architectures

Each human genome is a 3 billion base pair set of encoding instructions. Decoding the genome using deep learning fundamentally differs from most tasks, as we do not know the full structure of the data and therefore cannot design architectures to suit it. As such, architectures that fit the structure of genomics should be learned not prescribed. Here, we develop a novel search algorithm, applica...

متن کامل

Functional motifs in Escherichia coli NC101

Escherichia coli (E. coli) bacteria can damage DNA of the gut lining cells and may encourage the development of colon cancer according to recent reports. Genetic switches are specific sequence motifs and many of them are drug targets. It is interesting to know motifs and their location in sequences. At the present study, Gibbs sampler algorithm was used in order to predict and find functional m...

متن کامل

Prototype Matching Networks for Large-Scale Multi-label Genomic Sequence Classification

One of the fundamental tasks in understanding genomics is the problem of predicting Transcription Factor Binding Sites (TFBSs). With more than hundreds of Transcription Factors (TFs) as labels, genomic-sequence based TFBS prediction is a challenging multi-label classification task. There are two major biological mechanisms for TF binding: (1) sequence-specific binding patterns on genomes known ...

متن کامل

Cloning and molecular characterization of TaERF6, a gene encoding a bread wheat ethylene response factor

Ethylene response factor proteins are important for regulating gene expression under different stresses. Different isoforms for ERF have previously isolated from bread wheat (Triticum aestivum L.) and related genera and called from TaERF1 to TaERF5. We isolated, cloned and molecular characterized a novel one based on TdERF1, an isoform in durum wheat (Tri...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1605.01133  شماره 

صفحات  -

تاریخ انتشار 2016